Applications of Information Geometry to Audio Signal Processing

Authors

  • Arnaud Dessein
  • Arshia Cont
Abstract

In this talk, we present some applications of information geometry to audio signal processing. We seek a comprehensive framework for quantifying, processing and representing the information contained in audio signals. In digital audio, a sound signal is generally encoded as a waveform, and a common problem is to extract relevant information about the signal by computing sound features from this waveform. A key issue in this context is to bridge the gap between the raw signal or low-level features (e.g. attack time, frequency content) and the symbolic properties or high-level features (e.g. speaker, instrument, music genre). We address this issue by employing the theoretical framework of information geometry. In general terms, information geometry is a field of mathematics that studies the notions of probability and information by means of differential geometry [1]. The main idea is to analyze the differential-manifold structure possessed by certain families of probability distributions, which form a statistical manifold. We aim to investigate the intrinsic geometry of families of probability distributions that represent audio signals, and to manipulate informative entities of sounds within this geometry. We focus on the statistical manifolds related to exponential families. Exponential families are parametric families of probability distributions that encompass most of the distributions commonly used in statistical learning. Moreover, exponential families equipped with the dual exponential and mixture affine connections possess two dual affine coordinate systems, namely the natural and the expectation parameters. The underlying dually flat geometry exhibits a strong Hessian dualistic structure, induced by a twice-differentiable convex function, called the potential, together with its Legendre-Fenchel conjugate.
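For readers who want the objects named above in symbols, the following is a standard textbook formulation rather than notation taken from the talk itself. The symbols T (sufficient statistic), k (carrier term), F (potential, i.e. the log-normalizer), theta (natural parameter) and eta (expectation parameter) are assumptions introduced here for illustration, and the last identity assumes a regular family for which the gradient map is invertible.

```latex
% Notation assumed for illustration only (not from the source abstract).
\begin{align*}
p_\theta(x) &= \exp\bigl(\langle \theta, T(x)\rangle - F(\theta) + k(x)\bigr), \\
\eta &= \nabla F(\theta) = \mathbb{E}_{p_\theta}[T(x)], \\
F^*(\eta) &= \sup_{\theta}\,\bigl\{\langle \theta, \eta\rangle - F(\theta)\bigr\}, \\
\theta &= \nabla F^*(\eta).
\end{align*}
```

As a one-line instance, the Poisson family has F(theta) = exp(theta), so eta = exp(theta) is the usual rate, and the conjugate is F*(eta) = eta log eta - eta.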
This geometry generalizes the standard self-dual Euclidean geometry, with two dual Bregman divergences instead of the self-dual Euclidean distance, as well as dual geodesics, a generalized Pythagorean theorem and dual projections. However, Bregman divergences are generalized distances that are not symmetric and do not satisfy the triangle inequality in general. From a computational viewpoint, several machine learning algorithms that rely on the strong metric properties of the Euclidean distance are therefore no longer suitable. Yet, recent work has proposed to generalize some of these algorithms to the case of exponential families and of their associated Bregman ...
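The remarks on symmetry and the triangle inequality can be checked numerically. The sketch below is not code from the talk; it assumes the usual definition B_F(x || y) = F(x) - F(y) - <grad F(y), x - y>, and the function names and test vectors are hypothetical choices made for illustration. It uses two generators: half the squared norm, which gives half the squared Euclidean distance, and negative entropy, which gives the Kullback-Leibler divergence on the probability simplex.

```python
import numpy as np


def bregman_divergence(F, grad_F, x, y):
    """Bregman divergence B_F(x || y) = F(x) - F(y) - <grad F(y), x - y>."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return F(x) - F(y) - np.dot(grad_F(y), x - y)


# Generator 1: half the squared norm (yields half the squared Euclidean distance).
def half_sq_norm(x):
    return 0.5 * np.dot(x, x)


def grad_half_sq_norm(x):
    return x


# Generator 2: negative entropy (yields the KL divergence for points on the simplex).
def neg_entropy(x):
    return np.sum(x * np.log(x))


def grad_neg_entropy(x):
    return np.log(x) + 1.0


# Hypothetical discrete distributions, chosen only for illustration.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.2, 0.3, 0.5])

# The Euclidean generator is self-dual, so both orders agree.
d_euc_pq = bregman_divergence(half_sq_norm, grad_half_sq_norm, p, q)
d_euc_qp = bregman_divergence(half_sq_norm, grad_half_sq_norm, q, p)

# The negative-entropy generator is not symmetric: KL(p||q) != KL(q||p).
kl_pq = bregman_divergence(neg_entropy, grad_neg_entropy, p, q)
kl_qp = bregman_divergence(neg_entropy, grad_neg_entropy, q, p)

# Triangle inequality: three collinear points already break it for the
# squared-distance generator, since B(a||c) exceeds B(a||b) + B(b||c).
a, b, c = np.array([0.0]), np.array([1.0]), np.array([2.0])
d_ac = bregman_divergence(half_sq_norm, grad_half_sq_norm, a, c)
d_ab = bregman_divergence(half_sq_norm, grad_half_sq_norm, a, b)
d_bc = bregman_divergence(half_sq_norm, grad_half_sq_norm, b, c)

print("Euclidean generator:", d_euc_pq, d_euc_qp)
print("KL(p||q), KL(q||p):", kl_pq, kl_qp)
print("B(a||c) vs B(a||b) + B(b||c):", d_ac, d_ab + d_bc)
```

Algorithms that assume a metric, such as nearest-neighbour search with triangle-inequality pruning, therefore cannot be applied to a Bregman divergence without modification, which is the point the abstract makes.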


Similar resources

A survey on digital data hiding schemes: principals, algorithms, and applications

This paper investigates digital data hiding schemes. The concept of information hiding will be explained at first, and its traits, requirements, and applications will be described subsequently. In order to design a digital data hiding system, one should first become familiar with the concepts and criteria of information hiding. Having knowledge about the host signal, which may be audio, image, ...


A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...


Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic warfare, sonar, etc. Beamforming methods such as Minimum Variance Distortionless Response (MVDR) and Delay-and-Sum (DAS), and the subspace-based Multiple Signal Classification (MUSIC), are the best-known DOA estimation techniques. These methods have high computational complexity. Hence using...


Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)

Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...


Parleda: a Library for Parallel Processing in Computational Geometry Applications

ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...




Publication date: 2011